List of text corpora (English Wikipedia)

The following is a list of text corpora in various languages. "Text corpora" is the plural of "text corpus". A text corpus is a large and structured set of texts (nowadays usually electronically stored and processed). Text corpora are used for statistical analysis and hypothesis testing, checking occurrences, or validating linguistic rules within a specific language territory.

== English language ==
* Google N-Grams Corpus – Largest English corpus at 155 billion words. (Professor Mark Davies at BYU created an online tool to search Google's English language corpus, drawn from Google Books, at http://googlebooks.byu.edu/x.asp.) Also has corpora for other languages. To download datasets of this corpus, see the Google Ngram Viewer.
* American National Corpus
* Bank of English
* British National Corpus
* Corpus Juris Secundum
* Corpus of Contemporary American English (COCA): 425 million words, 1990–2011. Freely searchable online.
* Brown Corpus, forming part of the "Brown Family" of corpora, together with LOB, Frown and F-LOB.
* International Corpus of English
* Oxford English Corpus
* Scottish Corpus of Texts & Speech
* Corpus Resource Database (CoRD), more than 80 English language corpora. (http://www.helsinki.fi/varieng/CoRD/corpora/)
Excerpt source: Wikipedia, the free encyclopedia, article "List of text corpora".